Skip to content

[v1.x] fix: handle ClosedResourceError when transport closes mid-request#2334

Merged
maxisbey merged 2 commits intomodelcontextprotocol:v1.xfrom
owendevereaux:fix/v1x-closed-resource-error-2328
Mar 24, 2026
Merged

[v1.x] fix: handle ClosedResourceError when transport closes mid-request#2334
maxisbey merged 2 commits intomodelcontextprotocol:v1.xfrom
owendevereaux:fix/v1x-closed-resource-error-2328

Conversation

@owendevereaux
Copy link
Copy Markdown

Summary

Backports fixes from #2306 to v1.x to address issue #2328.

When the transport closes while handlers are processing requests (e.g., stdin EOF during a long-running tool call, or a malicious client sending invalid UTF-8 bytes that crashes the transport), the server could crash with ClosedResourceError when trying to send a response through the already-closed write stream.

Changes

  1. Wrap message loop in try/finally with cancel - When the transport closes, in-flight handlers are now cancelled instead of being left to attempt responding through a closed stream
  2. Catch ClosedResourceError on respond - If a response attempt fails due to closed transport, log and return gracefully instead of crashing
  3. Fix cancellation handling - Distinguish between client-initiated cancellation (already responded) and transport-close cancellation (should re-raise)
  4. Fix dict iteration in finally - Use list() snapshot when iterating _response_streams to avoid 'dictionary changed size during iteration' errors

Testing

Added two new test cases that reproduce the crash scenarios:

  • test_server_cancels_in_flight_handlers_on_transport_close
  • test_server_handles_transport_close_with_pending_server_to_client_requests

All existing tests continue to pass.

Fixes #2328

owendevereaux and others added 2 commits March 22, 2026 17:24
Backports fixes from #2306 to v1.x to address issue #2328.

When the transport closes while handlers are processing requests (e.g., stdin
EOF during a long-running tool call), the server could crash with
ClosedResourceError when trying to send a response through the already-closed
write stream.

This fix:
1. Wraps the message loop in a try/finally that cancels in-flight handlers
   when the transport closes, preventing them from attempting to respond
2. Catches BrokenResourceError and ClosedResourceError when calling
   message.respond() and logs instead of crashing
3. Properly re-raises transport-close cancellation to let the task group
   handle it (vs client-initiated cancellation which already sent a response)
4. Uses list() snapshot when iterating _response_streams in the finally block
   to avoid 'dictionary changed size during iteration' errors

Fixes #2328
- Hoist SessionMessage/types imports from function bodies to module level
- Add dict[str, Any] type arguments to satisfy pyright strict mode
Copy link
Copy Markdown
Contributor

@maxisbey maxisbey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — faithful backport of #2306. Pushed a minor fix (b08d7fc) to hoist inline imports to module level and add dict[str, Any] type args for pyright.

AI Disclaimer

@maxisbey maxisbey enabled auto-merge (squash) March 24, 2026 22:22
@maxisbey maxisbey merged commit 6524782 into modelcontextprotocol:v1.x Mar 24, 2026
21 checks passed
@bbarwik
Copy link
Copy Markdown

bbarwik commented Apr 9, 2026

@owendevereaux @maxisbey This issue was not correctly fixed. Bellow is report what is wrong, I'll add issue for it too:

Bug: AssertionError: Request already responded to — cancellation race in v1.27.0

AssertionError: Request already responded to when CancelledNotification arrives after handler completes but before respond()

Description

When a client sends a notifications/cancelled for a request whose handler has already finished executing but hasn't yet called message.respond(), the server crashes with AssertionError: Request already responded to.

PR #2334 (v1.27.0) fixed the ClosedResourceError crash path by catching CancelledError in _handle_request and guarding respond() against BrokenResourceError/ClosedResourceError. However, it left a race window between handler completion and respond() where a cancellation notification can set _completed = True first, causing the assert on line 129 of session.py to fire.

Environment

  • mcp 1.27.0
  • Python 3.14
  • anyio (asyncio backend)
  • FastMCP stdio transport
  • Client: Claude Code 2.1.92

Reproduction scenario

  1. Client sends a tools/call request with a long-running handler (e.g. polling with 600s timeout)
  2. Handler completes and returns a result
  3. Between the handler's return and the await message.respond(response) call in _handle_request, the client sends notifications/cancelled for that same request ID
  4. The cancellation notification handler (session.py:403-406) calls responder.cancel(), which:
    • Calls _cancel_scope.cancel()
    • Sets _completed = True
    • Sends an error response "Request cancelled"
  5. Back in _handle_request, execution reaches await message.respond(response) at server.py:800
  6. respond() hits assert not self._completed at session.py:129crash

The cancel_scope.cancel() only raises CancelledError if the task is currently in an await. Since the handler already returned, the code path between the handler return and respond() is synchronous — no checkpoint where CancelledError can be delivered. The except anyio.get_cancelled_exc_class() guard at server.py:773 never fires.

Stack trace

ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
    +-+---------------- 1 ----------------
      | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
      +-+---------------- 1 ----------------
        | Traceback (most recent call last):
        |   File "mcp/server/lowlevel/server.py", line 703, in _handle_message
        |     await self._handle_request(message, req, session, lifespan_context, raise_exceptions)
        |   File "mcp/server/lowlevel/server.py", line 800, in _handle_request
        |     await message.respond(response)
        |   File "mcp/shared/session.py", line 129, in respond
        |     assert not self._completed, "Request already responded to"
        | AssertionError: Request already responded to
        +------------------------------------

Analysis

The race window in _handle_request (server.py:719):

# Line 770: handler completes, returns response
response = await handler(req)

# ... exception handling ...

# Line 799-800: GAP — between handler return and respond(),
# a CancelledNotification can arrive on another task and call
# responder.cancel(), setting _completed = True and sending
# an error response. No await in this gap means no CancelledError
# can be delivered.
try:
    await message.respond(response)  # <-- assert fires here
except (anyio.BrokenResourceError, anyio.ClosedResourceError):
    ...

The except anyio.get_cancelled_exc_class() at line 773 correctly handles the case where the cancellation arrives during handler execution. But it cannot handle cancellation that arrives after the handler returns, because there's no async checkpoint between the handler return and respond().

Suggested fix

In session.py, change respond() to handle the already-completed case gracefully instead of asserting:

async def respond(self, response: SendResultT | ErrorData) -> None:
    if not self._entered:
        raise RuntimeError("RequestResponder must be used as a context manager")
    
    # If already completed (e.g. by a concurrent cancellation), skip silently.
    if self._completed:
        return

    if not self.cancelled:
        self._completed = True
        await self._session._send_response(
            request_id=self.request_id, response=response
        )

Alternatively, the guard could be added in _handle_request before calling respond():

if not message._completed:
    try:
        await message.respond(response)
    except (anyio.BrokenResourceError, anyio.ClosedResourceError):
        logger.debug("Response for %s dropped - transport closed", message.request_id)

The first approach (in respond() itself) is more robust since it closes the race for all callers.

Impact

This crashes the entire MCP server process, killing all in-flight requests. In our case, the server manages multiple long-running background agents, so a crash loses all active work. The crash is triggered by normal client behavior (user cancels an operation), making it a reliability issue rather than an edge case.

Related

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants